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ABSTRACT 

We present the first results from a Bayesian analysis of the WMAP first 
year data using a Gibbs sampling technique. Using two independent, parallel 
supercomputer codes we analyze the WMAP Q, V and W bands. The analysis 
results in a full probabilistic description of the information the WMAP data set 
contains about the power spectrum and the all-sky map of the cosmic microwave 
background anisotropics. We present the complete probability distributions for 
each Ci including any non-Gaussianities of the power spectrum likelihood. While 
we find good overall agreement with the previously published WMAP spectrum, 
our analysis uncovers discrepancies in the power spectrum estimates at low £ 
multipoles. For example we claim the best-fit ACDM model is consistent with 
the C*2 inferred from our combined Q+V+W analysis with a 10% probability 
of an even larger theoretical Ci. Based on our exact analysis we can therefore 
attribute the "low quadrupole issue" to a statistical fluctuation. 

Subject headings: cosmic microwave background — cosmology: observations - 
methods: numerical 
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1. Introduction 



Cosmic Microwave Background (CMB) power spectrum estimation from large datasets 
such as the Wilkinson Microwave Anisotropy Probe ( WMAP) (Bennett et al. 2003a) presents 
a considerable computational challenge. With the exception of some specialized methods 
(Oh et al. 1999; Wandelt and Hansen 2003), obtaining the power spectrum and error bars 
for a generalized, large CMB dataset without introducing significant simplifications has, 
until now, been computationally impossible. However, a new numerical approach based 
on Gibbs sampling has been developed recently by Jewell, Levin, and Anderson (2004) 
and Wandelt, Larson, and Lakshminarayanan (2004) which offers the hope of overcoming 
these computational difficulties and allowing the analysis of large CMB datasets regardless 
of scanning strategy, noise characteristics and so on. In this letter we present our results 
from the application of this new method to the first year WMAP data. We developed two 
independent, parallel codes to achieve this and a detailed description of the implementation of 
these codes is given in a companion publication (Eriksen et al. 2004). The analysis results in 
a full, multivariate probabilistic description of the information the WMAP data set contains 
about the power spectrum and the all-sky map of the CMB anisotropies. For the purposes of 
this letter we limit ourselves to a presentation of the results at each angular scale £, leaving a 
full multivariate exploration of the WMAP likelihood for a future publication. In §2 we give 
a very brief overview of our method and in §3 we present on overview of the implementation 
of our codes. §4 contains our results, provides some interpretation and discusses what light 
our method can shed on some of the anomalies in the WMAP power spectrum that have 
been discussed in the literature (e.g. Hinshaw et al. (2003); Spergel et al. (2003); Efstathiou 
(2004); Slosar, Seljak, & Makarov (2004)). Our conclusions and future directions for this 
work are detailed in §5. 



2. Method Overview 

In this section we present only a very brief overview of our method and codes and refer 
the reader to Eriksen et al. (2004) for a detailed discussion of our implementations and Jewell 
et al. (2004) and Wandelt et al. (2004) for a discussion of the underlying principles. 

The main difference between our approach and previous attempts at power spectrum 
estimation is that rather than trying to solve the maximum likelihood problem directly, we 
sample from the probability density of the power spectrum Ce given the data d, P(Ce\d). 
While directly sampling from this density is difficult or impossible, we can sample from the 
joint posterior density P(Q, s|d), where s is the CMB signal map. Expectation values of 
any statistic of the Ce will converge to the expectation values of P(C(\d). The theory of 
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Gibbs sampling (Gelfand and Smith 1990; Tanner 1996) tells us that if we can sample from 
the conditional densities P(s\Cg, d) and P(Cg\s, d) then iterating the following two sampling 
equations will, after an initial burn-in period, lead to a sample from the joint posterior 



Here the symbol >" denotes drawing a random realization from the density on the right. 
The conditional density of the signal given the most recent Ci sample is P(s|C|, d) oc 
G(S t (S t + N)~ 1 m, ((S* 1 ) -1 + iV -1 ) -1 ), where m is the map constructed from the data d. 
To sample from this density we simply generate a Gaussian variate with the required mean 
and covariance. The density for the Cg factorizes to an inverse gamma distribution due 
to the special form of S and sampling from this is very simple. For each £ we compute 
^ = Em=-i l s /mJ 2 an d a (2£-l)-vector of Gaussian random variates with zero mean and unit 
variance. Then C\ +1 = A. Given this sample from P(Q,s|d), the marginalization over s 
to obtain P(C^|d) is trivially accomplished: just ignore the s in the tuple (C^,s). Of course 
this is not to say that the s sample is purely ancillary in the analysis: it is also of interest 
to explore P(s|d) which is a full representation of the CMB signal content of the data. 

The scaling of this Monte Carlo method is O(N^) and corresponds to the scaling of the 
spherical harmonic transforms performed by the HEALPix algorithm. Other issues which 
affect our computational cost are the choice of a good preconditioner for our linear algebra 
solver, the correlation length between samples and the length of burn-in time required for 
the sampling scheme, eqs. 1 and 2, to converge. We perform the sampling in eq. 1, by 
means of solving linear systems of equations using the Conjugate Gradient algorithm. The 
computational cost of this method is therefore highly dependent on the choice of a good 
preconditioner for this system. We experimented with several preconditioners and found 
examples which gave convergence in a few tens of iterations. The number of samples we 
need to obtain in order to adequately describe the statistics of the power spectrum will 
depend upon the degree of correlation between successive samples. A detailed discussion of 
these issues can be found in Eriksen et al. (2004). 



P(Q,s|d): 




P(s|Q,d), 
P(Q|s l+1 ). 



(1) 
(2) 



2.1. Foregrounds 



An appealing feature of the sampling approach is its ability to incorporate virtually any 
real- world complications, as discussed by Jewell et al. (2004) and Wandelt et al. (2004). A 
few examples of this flexibility are applications to f^ 1 noise, asymmetric beams or arbitrary 
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sky coverage. Foreground estimation and removal are also possible. In principle, detailed 
prior knowledge of all anticipated foregrounds can be included and the algorithm will then 
return the level of the foreground supported by the data in the map. However, for this first 
analysis we limit ourselves to including two simple foreground components: (1) a stochastic 
model of the monopole and dipole in the maps, and (2) a stochastic model of the foreground 
component. These models are encoded in terms of uniform, improper priors that express 
our ignorance of the monopole and dipole in the maps as well as the foreground contribu- 
tions in the regions that are flagged in the foreground masks supplied by the WMAP team. 
Traditionally this masked region is excluded from the analysis altogether. In the Bayesian 
approach it is modeled as a region where the foregrounds are completely unknown. Sam- 
pling from the posterior density reconstructs the signal in the masked region, based on the 
correlation structure discovered on the unmasked portion of the sky. The details of these 
foreground models are outlined in the above papers as well as in Eriksen et al. (2004). 



3. Implementation 

We used two independently developed implementations of the algorithm to produce our 
results: MAGIC (Wandelt 2004) and Commander. These are both parallel implementations 
of the sampling algorithm. Having results from both MAGIC and Commander allowed us 
to cross-check results and provided valuable redundancy. Our tests of the codes are detailed 
in Eriksen et al. (2004) and these showed that both codes were producing results consistent 
with each other. In both codes our noise model for each channel consisted of combining the 
published number of observations per pixel with the noise per sample for each channel and 
assuming the noise to be uncorrelated. We also show in Eriksen et al. (2004) that residual 
correlated noise in the WMAP maps has negligible effect on our results. 

The data input to these codes were the first year WMAP maps 1 for all 8 of the cosmo- 
logically interesting frequency bands (2 at Q-, 2 at V- and 4 at W-band). For all channels we 
used template corrected maps following Bennett et al. (2003a). We performed two analyses 
with these data. First we analyzed the data in the V and W channels separately by band, 
using unweighted averages of the channel maps and computing an effective beam window 
function. For these runs we chose the conservative KpO mask and ran single, long chains, 
containing 1000 samples for each band. Q-band was analyzed more aggressively, combining 
the channels on the likelihood level (Eriksen et al. 2004), and using the Kp2 mask to as- 
sess potential foreground contamination. These analyses will be labeled Q, V and W in the 
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following. 

The second analysis uses the optimal combination of all 8 channels in the likelihood 
using their respective noise specifications and beam window functions. We will refer to 
this joint analysis as QVW in the following. As before, we specified a stochastic model for 
the monopole and dipole component as well as complete ignorance about the foregrounds 
within a region on the sky. For the QVW analysis this region was defined by the more 
aggressive Kp2 mask. Both the Q and QVW analyses were done in parallel, short chains 
(~10 chains, ~100 samples each) initialized with plausible starting spectra inspired by the 
WMAP analysis (Hinshaw et al. 2003). 

4. Results 

All of the results presented here are based on ~ 1000 samples for each frequency band, 
which required a total of ~15,000 hours of CPU time 

Figure 1 shows the Bayes estimate (posterior mean) for the signal QVW analysis. This 
map represents the CMB signal content in the Q, V, and W bands of the WMAP data. This 
map shows detail where the data warrants it and smoothes where the data is poor. The 
CMB signal is clearly visible outside the Galaxy and the algorithm has reconstructed the 
signal in the Galactic plane for the largest scale modes, consistent with the signal phases 
and correlations it has discovered on the high latitude sky. There is insufficient information 
in the map for the algorithm to reconstruct the smaller scale modes in the galactic plane. 
This result is a non-linear generalization of the Wiener filter which does not assume prior 
information about the signal correlations but discovers them from the data. 

Figure 2 shows the power spectra we obtain for Q-, V- and W-band and the combined 
QVW analysis. Large scale features in the power spectra are consistently reproduced across 
all the channels. 

In Figure 3, we show the probability distribution of the Ce samples for some I of interest. 
Since the original WMAP analysis was released there has been some controversy regarding a 
lack of power at the largest angular scales (lowest ts) e.g. (Efstathiou 2004; Slosar, Seljak, & 
Makarov 2004). It can be seen from the plots that our estimates of these Q's are consistent 
with the original determinations made by the WMAP team, but that there are significant, 
non-Gaussian uncertainties on our estimates which indicate that low values of Cg for the 
largest angular scales are not so unexpected. We also over-plot the results using the WMAP 
internal linear combination map with the Kp2 mask from Efstathiou (2004) in our figure. It 
can be seen that there is good agreement between the values obtained in our analysis and 
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that of Efstathiou. 

For easy reference we show the cumulative probability distributions for the lowest £ in 
Figure 4. One can read off the probability that the actual theory C e are lower than the 
prediction from any particular model. Probabilities close to or 1 indicate a poor fit, or 
outlier. A search over all strongly signal dominated £ from 2 to 350 using our V and W 
band analyses resulted in 10 outliers (within 0.01 of or 1) for V band (£ =54, 73, 114, 
117, 121, 179, 181, 209, 300, 322) and 9 outliers for W band (£=73, 82, 117, 121, 181, 261, 
334, 341, 344). For 349 tests this corresponds to only a slightly higher number of outliers 
than expected based on counting statistics, and is entirely unremarkable if we only count 
those outliers that appear in both bands. Based on this £-by-£ analysis the "bite" in the 
spectrum does not seem particularly extraordinary, with only one outlier in the range of 200- 
220. However, all C e at 205 < £ < 210 are quite far below the best fit and will most likely 
be noted as a highly significant outlier in a full multivariate analysis of the joint posterior 
density. 



5. Discussions and Conclusions 

We have implemented the Gibbs sampling technique introduced by Jewell et al. (2004) 
and Wandelt et al. (2004), and applied it to the WMAP data. This first presentation of our 
results focuses on the inferred marginalized likelihoods for each £. We find that these are 
broadly consistent with previous determinations of the power spectrum. By virtue of using 
a Bayesian analysis we can present present a more complete picture of the uncertainties in 
the estimates across all relevant scales. Previous likelihood-based work to generate a full 
probabilistic description (e.g. Slosar, Seljak, & Makarov (2004)) were limited to looking at 
only the lowest £ due to the computational difficulty involved. We present an analysis that 
covers the region from £ = 2 to £ = 500. A multivariate treatment will therefore allows a 
rigorous assessment of the significance of anomalies in the power spectrum such as the low 
power on large angular scales, the "bite" in the power spectrum near the peak, etc., as well 
as using the full multivariate posterior density as the input to parameter estimation. This 
is particularly interesting since the Bayesian approach yields a converging analytic approx- 
imation to the likelihood of Cg given the data, it is not necessary to construct parametric 
approximations to the likelihood such as the Gaussian or shifted log-normal approximations. 
A detailed study of these issues will be presented in a future publication. 

More information can be drawn from the data in the future. The extension to more 
complex foreground models, the inclusion of correlated noise and the addition of polarization 
to the analysis will all be of great interest, especially in view of the upcoming release of the 
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second year of WMAP data and, further in the future, the Planck experiment. 
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Fig. I. — The posterior mean (s)p( s | d ) of the CMB signal s given the WMAP first year data, 
including all 8 channels in the Q, V and W bands. This map represents the CMB signal 
content of the WMAP data. It contains detail where the data warrants it and smoothes 
where the data is poor. For example, the algorithm is able to reconstruct the largest scale 
modes in the Galactic plane based on the signal phases and correlations it discovers in regions 
of the map outside the Galactic plane. The smallest scale which can be reproduced in the 
Galactic plane is set by the mask size and is a natural consequence of Wiener filtering. 

Fig. 2. — Four power spectra obtained from the Gibbs sampling algorithm. The uppermost 
panel shows the Q-band power spectrum, then V-band, W-band and finally the lower panel 
the combined analysis from all eight cosmologically interesting WMAP channels. Gray 
dots represent the individual samples and the dark line is the mode of the unbinned power 
spectrum. The lighter gray line in the bottom panel is the WMAP combined CMB power 
spectrum and where the line is not visible the difference between our result and the original 
WMAP result differ by less than the width of the line. The Q and Q+V+W result was 
obtained with an initial power spectrum based on the WMAP best-fit power spectrum with 
a random component, while the V and W results were obtained using an initial spectrum 
proportional to 1/ £{£+!). Our V and W results were obtained using the conservative 
KpO mask while our Q-band result uses the Kp2 mask. This was done to assess potential 
foreground contamination, which we find no evidence for. 

Fig. 3. — The probability distributions of the Ce samples for a selected set of £ giving 
the probability of obtaining a value of Ci within a given histogram bin. We plot re- 
sults for the significantly non-Gaussian \ow-£ multipoles and for selected higher values 
based on their deviation from the best-fit A-CDM model. There are 50 bins in each his- 
togram. Red, green and blue histograms are Q, V and W-band respectively. Black is 
the combined QVW analysis. The dotted vertical line is the WMAP best estimate of the 
Ce value. The solid line is the WMAP best fit theory Cgs and the dash-dot-dot line is 
the average of the samples from our algorithms. For £ < 7 the dashed lines show the 
WMAP-ILC values from Efstathiou (2004) for comparison. Further plots are available at 
http:/ /www. astro. uiuc.edu/~iodwyer/research#wmap. 

Fig. 4. — Easy-to-use representation for evaluating theoretical models of the power spectrum 
at the lowest £. We plot the probability P(C^ heory < Ce) against £(£+ l)Ce/2n for the largest 
angular scales probed by WMAP. Theorists can evaluate the goodness of fit of their model 
Cg with the WMAP data by reading off the probability from this graph. If P is within 0.025 
of 1 or 0, the model would be ruled out at the 5% level based on that £ alone. The vertical 
dotted line is the WMAP best fit theory assuming a constant scalar spectral index n s . The 
probability is ~90% that the the actual theory C2 is smaller. 
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